Linear Regression using Heterogeneous Data Batches
In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall into one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are k subgroups, each with its own regression vector. Prior work [KSS 20] showed that, given abundant small batches, the regression vectors can be learned with only a few, \tilde\Omega(k^{3/2}), medium-size batches of \tilde\Omega(\sqrt k) samples each. However, that paper requires the input distribution of all k subgroups to be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem".
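
To make the data model concrete, the following is a minimal sketch of how such batches could be simulated. It is an illustration under stated assumptions, not the method of this paper or of [KSS 20]: the function name `make_batches`, its parameters, the uniform choice of subgroup, the Gaussian noise, and the standard Gaussian inputs are all hypothetical choices made here. The setting described above allows an unknown input distribution for each subgroup.

```python
import random


def make_batches(num_batches, batch_size, regression_vectors,
                 noise_std=0.1, seed=0):
    """Simulate batches, each drawn from one hidden subgroup.

    Hypothetical helper for illustration only. Returns a list of
    (group, samples) pairs, where samples is a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    k = len(regression_vectors)        # number of subgroups
    d = len(regression_vectors[0])     # input dimension
    batches = []
    for _ in range(num_batches):
        # Hidden subgroup label; a learner would not observe it.
        # Assumption: subgroups are chosen uniformly at random.
        group = rng.randrange(k)
        w = regression_vectors[group]
        samples = []
        for _ in range(batch_size):
            # Assumption: standard Gaussian inputs, used here only
            # because they are easy to sample.
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            # Output is a noisy linear combination of the inputs.
            # Assumption: the noise is zero-mean Gaussian.
            y = sum(wi * xi for wi, xi in zip(w, x))
            y += rng.gauss(0.0, noise_std)
            samples.append((x, y))
        batches.append((group, samples))
    return batches


# Two subgroups (k = 2) in dimension 2, five batches of three samples.
batches = make_batches(num_batches=5, batch_size=3,
                       regression_vectors=[[1.0, 0.0], [0.0, -2.0]])
```

Each batch shares a single regression vector, which is why a batch of three samples in this toy run carries too little information to identify its subgroup's vector on its own.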